support cuda 13.0 and trtllm kernel #9495
Force-pushed from 0621958 to e85f927.
FlamingoPg left a comment:
Overall LGTM; waiting for CI.
Follow-up: #9680
@rainj-me How do you build sgl-kernel, please? A simple "make build"? Using the same procedure you wrote, I get this error:
/workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
This breaks the latest build on B200 with cu128, so I'll revert it first. @rainj-me
This must be on CUDA 13; it is for GB300.
Try using the following command to build:
This error occurs because the compiler's include path does not point to the correct location. Setting
export CPLUS_INCLUDE_PATH=/usr/local/cuda-13.0/targets/sbsa-linux/include/cccl
resolves the problem for me on GH200.
https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/
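For anyone hitting the same "cuda/atomic: No such file or directory" error, here is a minimal sketch of how one might locate the header before setting CPLUS_INCLUDE_PATH. The helper name find_cuda_atomic_include is hypothetical and not part of sglang or the CUDA toolkit. The layouts it probes (include/ and include/cccl/ under the toolkit root and under each targets/* directory) are assumptions based on the path quoted above.

```python
from pathlib import Path


def find_cuda_atomic_include(cuda_home):
    """Return the include directory that contains cuda/atomic, or None.

    Hypothetical helper: probes <cuda_home>/include and every
    <cuda_home>/targets/*/include, each with and without a trailing
    "cccl" component (an assumed layout, taken from the path in the
    comment above).
    """
    root = Path(cuda_home)
    candidates = [root / "include"]
    for target in sorted(root.glob("targets/*")):
        candidates.append(target / "include")
    for inc in candidates:
        for base in (inc, inc / "cccl"):
            if (base / "cuda" / "atomic").is_file():
                return base
    return None
```

Calling find_cuda_atomic_include("/usr/local/cuda-13.0") and exporting the returned directory as CPLUS_INCLUDE_PATH would reproduce the manual fix, provided the toolkit is installed at that location.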
Thank you. Is it worth trying the TRT-LLM kernels on the sm120 architecture (RTX PRO 6000)? I mostly use FP8 blockwise or compressed FP8 scales.


Motivation
#9490
Test
diff --git a/python/pyproject.toml b/python/pyproject.toml
index c23efbc2e..b29789d45 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -49,7 +49,7 @@ runtime_common = [
     "scipy",
     "timm==1.0.16",
     "tiktoken",
-    "torchao==0.9.0",
+    "torchao==0.12.0+git",
     "transformers==4.55.2",
     "uvicorn",
     "uvloop",
@@ -59,21 +59,19 @@ runtime_common = [
 srt = [
     "sglang[runtime_common]",
     "sgl-kernel==0.3.5",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
 blackwell = [
     "sglang[runtime_common]",
     "sgl-kernel",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
Modifications
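To see whether a local environment matches the pins in the diff above, something like the following sketch could help. It is not part of the PR. The names parse_pin and check_pins are hypothetical, and the comparison is a plain string match on the "==" pin, not a full PEP 440 version comparison.

```python
from importlib import metadata


def parse_pin(requirement):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, sep, version = requirement.partition("==")
    return name.strip(), (version.strip() if sep else None)


def _installed(name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(requirements, installed_version=None):
    """Return (name, wanted, found) for every exact pin that does not match.

    installed_version is an optional lookup callable, used here so the
    check can be exercised without the real packages installed.
    """
    lookup = installed_version or _installed
    mismatches = []
    for req in requirements:
        name, wanted = parse_pin(req)
        if wanted is None:
            continue  # unpinned entries such as "torchvision" are skipped
        found = lookup(name)
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches
```

For example, check_pins(["torchao==0.12.0+git", "flashinfer_python==0.2.14.post1"]) returns an empty list when both installed versions match the pins exactly.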
Accuracy Tests
Benchmarking and Profiling
Checklist